The perfect solution for detecting sarcasm in tweets #not
Authors
Abstract
To avoid a sarcastic message being understood in its unintended literal meaning, in microtexts such as messages on Twitter.com sarcasm is often explicitly marked with the hashtag ‘#sarcasm’. We collected a training corpus of about 78 thousand Dutch tweets with this hashtag. Assuming that the human labeling is correct (annotation of a sample indicates that about 85% of these tweets are indeed sarcastic), we train a machine learning classifier on the harvested examples, and apply it to a test set of a day’s stream of 3.3 million Dutch tweets. Of the 135 explicitly marked tweets on this day, we detect 101 (75%) when we remove the hashtag. We annotate the top of the ranked list of tweets most likely to be sarcastic that do not have the explicit hashtag. 30% of the top-250 ranked tweets are indeed sarcastic. Analysis shows that sarcasm is often signalled by hyperbole, using intensifiers and exclamations; in contrast, non-hyperbolic sarcastic messages often receive an explicit marker. We hypothesize that explicit markers such as hashtags are the digital extralinguistic equivalent of nonverbal expressions that people employ in live interaction when conveying sarcasm.
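The pipeline described in the abstract (treat the hashtag as a label, remove it from the text, train a classifier, then rank unmarked tweets by how sarcastic they look) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' system: the abstract does not name the classifier, so an add-one-smoothed Naive Bayes model stands in for it, and the English toy tweets and all function names are invented for the example.

```python
import math
import re
from collections import Counter

MARKER = "#sarcasm"  # explicit marker named in the abstract

# Invented toy data (assumption); the study itself uses Dutch tweets.
TOY_TWEETS = [
    "I just love waiting in the rain #sarcasm",
    "Great, another Monday!! #sarcasm",
    "Love it when the train is late #sarcasm",
    "The train leaves at nine",
    "Meeting on Monday at the office",
    "Rain expected this afternoon",
]


def tokenize(text):
    """Lowercase and split into words, hashtags, mentions and !/? runs."""
    return re.findall(r"[#@]?\w+|[!?]+", text.lower())


def features(text):
    """Tokens with the marker removed, so it cannot leak into the model."""
    return [tok for tok in tokenize(text) if tok != MARKER]


def train(tweets):
    """Count tokens per class; a tweet is 'sarcastic' if it carries MARKER."""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for tweet in tweets:
        label = MARKER in tokenize(tweet)
        docs[label] += 1
        counts[label].update(features(tweet))
    return counts, docs


def score(tweet, model):
    """Log-odds (sarcastic vs. not) under add-one-smoothed Naive Bayes."""
    counts, docs = model
    vocab = set(counts[True]) | set(counts[False])
    total = {label: sum(counts[label].values()) for label in (True, False)}
    log_odds = math.log((docs[True] + 1) / (docs[False] + 1))
    for tok in features(tweet):
        if tok not in vocab:
            continue  # unseen tokens carry no evidence either way
        p_sarc = (counts[True][tok] + 1) / (total[True] + len(vocab))
        p_other = (counts[False][tok] + 1) / (total[False] + len(vocab))
        log_odds += math.log(p_sarc / p_other)
    return log_odds


def rank(tweets, model):
    """Order tweets from most to least likely sarcastic."""
    return sorted(tweets, key=lambda t: score(t, model), reverse=True)


model = train(TOY_TWEETS)
```

Calling `rank(unmarked_tweets, model)` and inspecting the head of the list mirrors the annotation of the top-250 ranked tweets, while scoring marked tweets with the hashtag stripped mirrors the check in which 101 of the 135 explicitly marked tweets (75%) were detected.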
Similar resources
Detecting Sarcasm on Twitter: A Behavior Modeling Approach, by Ashwin Rajadesingan. A Thesis Presented in Partial Fulfillment of the Requirement for the Degree Master of Science. Approved September 2014 by the Graduate Supervisory Committee: Huan Liu, Chair
Sarcasm is a nuanced form of language where usually, the speaker explicitly states the opposite of what is implied. Imbued with intentional ambiguity and subtlety, detecting sarcasm is a difficult task, even for humans. Current works approach this challenging problem primarily from a linguistic perspective, focussing on the lexical and syntactic aspects of sarcasm. In this thesis, I explore the...
"Having 2 hours to write a paper is fun!": Detecting Sarcasm in Numerical Portions of Text
Sarcasm occurring due to the presence of numerical portions in text has been quoted as an error made by automatic sarcasm detection approaches in the past. We present a first study in detecting sarcasm in numbers, as in the case of the sentence ‘Love waking up at 4 am’. We analyze the challenges of the problem, and present Rule-based, Machine Learning and Deep Learning approaches to detect sarca...
Who cares about Sarcastic Tweets? Investigating the Impact of Sarcasm on Sentiment Analysis
Sarcasm is a common phenomenon in social media, and is inherently difficult to analyse, not just automatically but often for humans too. It has an important effect on sentiment, but is usually ignored in social media analysis, because it is considered too tricky to handle. While there exist a few systems which can detect sarcasm, almost no work has been carried out on studying the effect that s...
An Empirical, Quantitative Analysis of the Differences Between Sarcasm and Irony
A variety of classification approaches for the detection of ironic or sarcastic messages has been proposed in the last decade to improve sentiment classification. However, despite the availability of psychologically and linguistically motivated theories regarding the difference between irony and sarcasm, these typically do not carry over to a use in predictive models; one reason might be that th...
Signaling sarcasm: From hyperbole to hashtag
To avoid a sarcastic message being understood in its unintended literal meaning, in microtexts such as messages on Twitter.com sarcasm is often explicitly marked with a hashtag such as ‘#sarcasm’. We collected a training corpus of about 406 thousand Dutch tweets with hashtag synonyms denoting sarcasm. Assuming that the human labeling is correct (annotation of a sample indicates that about 90% o...